Conversation
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
ko3n1g left a comment
Should we just refactor this to only use the eval window instead of `skip_n_steps`? I'm wondering if there will be a case in the future where we still need `skip_n_steps`.
Wouldn't it be useful in the case of a variable number of steps, like for convergence testing, or when the set number of steps was not completed due to a time limit or a crash? `skip_n_percent` can still calculate based on the available steps.
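To illustrate the trade-off being discussed, here is a hypothetical sketch of a percentage-based warmup skip: it adapts to however many steps actually completed, whereas a fixed step count would not. All names here are illustrative and not taken from the PR code.

```python
def stable_window_percent(step_times, skip_first_percent=0.1):
    """Drop the first skip_first_percent of steps as warmup, average the rest."""
    start = max(1, int(len(step_times) * skip_first_percent))
    window = step_times[start:]
    return sum(window) / len(window)

# A full run, and a run cut short by a time limit or crash:
full_run = [50.0, 30.0] + [10.0] * 18    # 20 steps completed
partial_run = [50.0, 30.0] + [10.0] * 6  # only 8 steps completed

print(stable_window_percent(full_run))     # warmup skipped proportionally
print(stable_window_percent(partial_run))  # still works on the available steps
```

The same percentage yields a sensible window in both cases, which is the argument for keeping a percentage-based option alongside explicit step boundaries.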
📝 Walkthrough

These changes introduce configurable timing window boundaries for performance evaluation. Two new CLI arguments (`--eval_time_start_step` and `--eval_time_end_step`) let users bound the window of steps used when computing timing averages.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks: ✅ 2 passed | ❌ 2 failed (1 warning, 1 inconclusive)
🧹 Nitpick comments (2)
`scripts/performance/utils/evaluate.py` (2)
684-689: Duplicated window calculation logic.

The `start`/`end` calculation at lines 684-687 duplicates the logic from `validate_performance` (lines 328-331). Consider extracting a helper function to ensure consistent behavior and reduce maintenance burden.

♻️ Proposed helper function extraction
Add a helper function at module level:

```python
def _get_eval_window(config: Dict[str, Any], num_steps: int) -> tuple[int, int | None]:
    """Compute (start, end) indices for the evaluation window.

    Args:
        config: Performance config dict with optional eval_time_start_step,
            eval_time_end_step, and skip_first_percent_time keys.
        num_steps: Total number of steps.

    Returns:
        Tuple of (start_index, end_index) where end_index may be None.
    """
    start = config.get("eval_time_start_step")
    if start is None:
        start = max(1, int(num_steps * config.get("skip_first_percent_time", 0.1)))
    end = config.get("eval_time_end_step")
    return start, end
```

Then use it in both locations:

```diff
-    start = config.get("eval_time_start_step")
-    if start is None:
-        start = max(1, int(len(steps) * config["skip_first_percent_time"]))
-    end = config.get("eval_time_end_step")
+    start, end = _get_eval_window(config, len(steps))
     current_stable = current_gpu_util_values[start:end]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `scripts/performance/utils/evaluate.py` around lines 684-689: extract the duplicated start/end window logic into a module-level helper function (e.g., `_get_eval_window(config: Dict[str, Any], num_steps: int) -> tuple[int, Optional[int]]`) that implements the same behavior as the duplicated blocks (use `config.get("eval_time_start_step")`, fall back to `max(1, int(num_steps * config.get("skip_first_percent_time", 0.1)))`, and return `eval_time_end_step` as `end` or `None`); then replace the duplicated logic in `evaluate.py` (the block computing start/end before computing `current_avg_iter_time_ms` and `golden_avg_iter_time_ms`) and the logic inside `validate_performance` with calls to `_get_eval_window(performance_config, len(steps))` to ensure consistent behavior.
328-333: Consider validating `start` and `end` bounds.

The current implementation doesn't validate that:

- `start` and `end` are non-negative (negative indices have different Python slice semantics)
- `start < end` (an empty slice would cause `np.nanmean` to return `nan`)

While unlikely in practice, invalid CLI inputs could produce confusing behavior.
🛡️ Optional: Add bounds validation

```diff
 start = config.get("eval_time_start_step")
 if start is None:
     start = max(1, int(len(steps) * config["skip_first_percent_time"]))
+elif start < 0:
+    raise ValueError(f"eval_time_start_step must be non-negative, got {start}")
 end = config.get("eval_time_end_step")
+if end is not None and end < 0:
+    raise ValueError(f"eval_time_end_step must be non-negative, got {end}")
+if end is not None and start >= end:
+    raise ValueError(f"eval_time_start_step ({start}) must be less than eval_time_end_step ({end})")
 current_stable = current_gpu_util_values[start:end]
 golden_stable = golden_gpu_util_values[start:end]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `scripts/performance/utils/evaluate.py` around lines 328-333: validate and clamp the computed `start` and `end` before slicing `current_gpu_util_values` and `golden_gpu_util_values`. Ensure `start` and `end` are integers >= 0, clamp them to the valid range (0 .. `len(steps)`), and check `start < end` (raise a `ValueError` or return a clear error) so you don't produce negative-index slices or empty ranges; fail fast with a clear message if inputs are invalid.
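The slice pitfalls flagged in this comment can be seen with plain lists (the same semantics apply to the NumPy arrays in `evaluate.py`); this is just an illustrative sketch:

```python
values = list(range(10))

# A negative start does not raise; Python interprets it from the end,
# so a bad eval_time_start_step would silently select the wrong window:
tail = values[-3:None]
print(tail)  # the last three elements rather than an error

# start >= end yields an empty slice; averaging it is meaningless
# (np.nanmean over an empty slice returns nan with a RuntimeWarning
# instead of raising):
empty = values[5:2]
print(empty)
```

Because both failure modes are silent, an explicit bounds check at parse time is the only place a user would see a clear error.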
ℹ️ Review info

⚙️ Run configuration
- Configuration used: `.coderabbit.yaml`
- Review profile: CHILL
- Plan: Pro
- Run ID: e1756638-dd38-4019-891b-ec4db2365970
📒 Files selected for processing (3)
- `scripts/performance/argument_parser.py`
- `scripts/performance/setup_experiment.py`
- `scripts/performance/utils/evaluate.py`
What does this PR do?
Evaluates performance between the given step boundaries.
Changelog
GitHub Actions CI
See the CI section in the Contributing doc for how to trigger the CI. An NVIDIA developer will need to approve and trigger the CI for external contributors.
Before your PR is "Ready for review"
Pre checks:
If you haven't finished some of the above items you can still open "Draft" PR.
Additional Information
Summary by CodeRabbit
- New `--eval_time_start_step` and `--eval_time_end_step` configuration options for performance evaluation. These parameters enable users to specify precise evaluation windows for timing averages, providing more flexible control over which test steps are included in performance analysis.
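The two new options could be declared roughly as follows. This is only a sketch: the actual definitions live in `scripts/performance/argument_parser.py` and may differ in defaults and help text.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--eval_time_start_step",
    type=int,
    default=None,
    help="First step index included when averaging iteration times "
         "(falls back to a percentage-based warmup skip when unset).",
)
parser.add_argument(
    "--eval_time_end_step",
    type=int,
    default=None,
    help="Step index at which the timing window ends, exclusive "
         "(defaults to the last available step when unset).",
)

args = parser.parse_args(["--eval_time_start_step", "10", "--eval_time_end_step", "90"])
print(args.eval_time_start_step, args.eval_time_end_step)
```

Defaulting both to `None` preserves the old percentage-based behavior for existing callers that do not pass either flag.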